18 research outputs found

    Parallel border tracking in binary images for multicore computers

    Border tracking in binary images is an important operation in many computer vision applications. The problem consists of finding the borders in a 2D binary image (one in which every pixel is either 0 or 1). Several algorithms are available for this problem, but most of them are sequential. In a previous paper, a parallel border tracking algorithm was proposed; it was designed to run on graphics processing units (GPUs) and was based on the sequential algorithm known as the Suzuki algorithm. In this paper, we adapt the previously proposed GPU algorithm so that it can be executed on multicore computers. The resulting algorithm is evaluated against its GPU counterpart. The results show that the performance of the GPU algorithm worsens (or the algorithm even fails) for very large images or images with many borders. On the other hand, the proposed multicore algorithm can efficiently cope with large images.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work has been partially supported by the Spanish Ministry of Science, Innovation, and Universities, jointly with the European Union, through Grants RTI2018-098085-BC41, PID2021-125736OB-I00 and PID2020-113656RB-C22 (MCIN/AEI/10.13039/501100011033, "ERDF A way of making Europe"). Also, the GVA has partially supported this research through project PROMETEO/2019/109.
    García Mollá, VM.; Alonso-Jordá, P. (2023). Parallel border tracking in binary images for multicore computers. The Journal of Supercomputing. 79:9915-9931. https://doi.org/10.1007/s11227-023-05052-2
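As a minimal illustration of the problem statement above, the sketch below marks the border pixels of a small binary image. It is not the Suzuki algorithm and not the authors' parallel method: it only flags every 1-pixel that has a 0-pixel (or the image edge) among its 4-neighbours, without following or labelling individual borders. The function name `border_pixels` and the choice of 4-connectivity are assumptions made for this example.

```python
def border_pixels(image):
    """Return the set of (row, col) positions of 1-pixels that touch a 0-pixel
    or the image edge through a 4-neighbour (hypothetical helper)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    border = set()
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            # Check the four direct neighbours: up, down, left, right.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or image[nr][nc] == 0:
                    border.add((r, c))
                    break
    return border


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(sorted(border_pixels(image)))
```

For this 3x3 block of ones, every 1-pixel except the centre `(2, 2)` is reported. Each pixel is tested independently of the others, which hints at why the detection step lends itself to parallel execution; following each border in order is the harder part that border tracking algorithms address.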

    Updating/downdating the NonNegative Matrix Factorization

    This is the author’s version of a work that was accepted for publication in Journal of Computational and Applied Mathematics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Computational and Applied Mathematics 318 (2017) 59–68. DOI 10.1016/j.cam.2016.11.048.
    The Non-Negative Matrix Factorization (NNMF) is a recent numerical tool that, given a nonnegative data matrix, tries to obtain its factorization as the approximate product of two nonnegative matrices. Nowadays, this factorization is used in many fields of science; in some of them, real-time computation of the NNMF is required. In some scenarios, not all the data is initially available, and when new data (new rows or columns) becomes available the NNMF must be recomputed. Recomputing the whole factorization every time is very costly and not suitable for real-time applications. In this paper we propose several algorithms that update the NNMF by taking advantage of the previously computed factorizations, with similar error and lower computational cost. © 2016 Elsevier B.V. All rights reserved.
    This work has been partially supported by EU together with Spanish Government through TEC2015-67387-C4-1-R (MINECO/FEDER), by Generalitat Valenciana through PROMETEOII/2014/003 and by Programa de FPU del Ministerio de Educacion, Cultura y Deporte FPU13/03828 (Spain). We want to thank Dr. Pedro Vera and his team (University of Jaen) for providing us with their music analysis software.
    San Juan Sebastián, P.; Vidal Maciá, AM.; García Mollá, VM. (2016). Updating/downdating the NonNegative Matrix Factorization. Journal of Computational and Applied Mathematics. 318:59-68. https://doi.org/10.1016/j.cam.2016.11.048

    Analysis of an efficient parallel implementation of active-set Newton algorithm

    This paper presents an analysis of an efficient parallel implementation of the active-set Newton algorithm (ASNA), which is used to estimate the nonnegative weights of linear combinations of the atoms in a large-scale dictionary to approximate an observation vector by minimizing the Kullback-Leibler divergence between the observation vector and the approximation. The performance of ASNA has been demonstrated in previous works against other state-of-the-art methods. The implementations analysed in this paper have been developed in C, using parallel programming techniques to obtain better performance on multicore architectures than the original MATLAB implementation. A hardware analysis is also performed to check the influence of CPU frequency and the number of CPU cores on the different implementations proposed. Thanks to the reduction in execution time, the new implementations allow the ASNA algorithm to tackle real-time problems.
    This work has been partially supported by Programa de FPU del MECD, by MINECO and FEDER from Spain, under project TEC2015-67387-C4-1-R, and by project PROMETEO FASE II 2014/003 of Generalitat Valenciana. The authors want to thank Dr. Konstantinos Drossos for some very useful mind changing discussions. This work has been conducted in Laboratory of Signal Processing, Tampere University of Technology.
    San Juan-Sebastian, P.; Virtanen, T.; García Mollá, VM.; Vidal Maciá, AM. (2018). Analysis of an efficient parallel implementation of active-set Newton algorithm. The Journal of Supercomputing. 75(3):1298-1309. https://doi.org/10.1007/s11227-018-2423-5
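For reference, the optimization problem described in the abstract above can be written as follows. This is a sketch assuming the generalized form of the Kullback-Leibler divergence that is commonly used for nonnegative data; the symbols $\mathbf{x}$ (observation vector), $\mathbf{B}$ (dictionary whose columns are the atoms) and $\mathbf{w}$ (weight vector) are notation chosen here, not taken from the listing.

```latex
\min_{\mathbf{w} \ge 0} \; \mathrm{KL}(\mathbf{x} \,\|\, \hat{\mathbf{x}}),
\qquad \hat{\mathbf{x}} = \mathbf{B}\mathbf{w},
\qquad
\mathrm{KL}(\mathbf{x} \,\|\, \hat{\mathbf{x}})
  = \sum_{i} \left( x_i \log\frac{x_i}{\hat{x}_i} - x_i + \hat{x}_i \right)
```

The divergence is zero exactly when $\hat{\mathbf{x}} = \mathbf{x}$ and positive otherwise, so minimizing it over nonnegative $\mathbf{w}$ yields the best nonnegative combination of atoms under this measure.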

    Low-complexity soft ML detection for generalized spatial modulation

    Generalized Spatial Modulation (GSM) is a recent Multiple-Input Multiple-Output (MIMO) scheme that achieves high spectral and energy efficiencies. Specifically, soft-output detectors have a key role in achieving the highest coding gain when an error-correcting code (ECC) is used. Nowadays, soft-output Maximum Likelihood (ML) detection in MIMO-GSM systems leads to a computational complexity that is infeasible for real applications; however, it is important to develop low-complexity decoding algorithms that provide a reasonable simulation time in order to make a performance benchmark available for MIMO-GSM systems. This paper presents three algorithms that achieve ML performance. In the first algorithm, different strategies are implemented, such as a preprocessing sorting step, in order to avoid an exhaustive search. In addition, clipping of the extrinsic log-likelihood ratios (LLRs) can be incorporated into this algorithm to give a lower-cost version. The other two proposed algorithms can only be used with clipping, and the results show a significant saving in computational cost. Furthermore, clipping allows a wide trade-off between performance and complexity by adjusting only the clipping parameter.
    Acknowledgements: This work has been partially supported by Spanish Ministry of Science, Innovation and Universities and by European Union through grant RTI2018-098085-BC41 (MCUI/AEI/FEDER), by GVA
    Simarro, MA.; García Mollá, VM.; Martínez Zaldívar, FJ.; Gonzalez, A. (2022). Low-complexity soft ML detection for generalized spatial modulation. Signal Processing. 196:1-12. https://doi.org/10.1016/j.sigpro.2022.108509

    Generalization of the K-SVD algorithm for minimization of β-divergence

    In this paper, we propose, describe, and test a modification of the K-SVD algorithm. Given a set of training data, the proposed algorithm computes an overcomplete dictionary by minimizing the β-divergence between the data and its representation as linear combinations of atoms of the dictionary, under strict sparsity restrictions. For the special case β = 2, the proposed algorithm minimizes the Frobenius norm and is therefore equivalent to the original K-SVD algorithm. We describe the modifications needed and discuss the possible shortcomings of the new algorithm. The algorithm is tested with random matrices and with an example based on speech separation.
    This work has been partially supported by the EU together with the Spanish Government through TEC2015-67387-C4-1-R (MINECO/FEDER) and by Programa de FPU del Ministerio de Educacion, Cultura y Deporte FPU13/03828 (Spain).
    García Mollá, VM.; San Juan-Sebastian, P.; Virtanen, T.; Vidal Maciá, AM.; Alonso-Jordá, P. (2019). Generalization of the K-SVD algorithm for minimization of β-divergence. Digital Signal Processing. 92:47-53. https://doi.org/10.1016/j.dsp.2019.05.001
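The link between the β-divergence and the Frobenius norm mentioned in the abstract above can be made explicit. The block below assumes the common parameterization of the β-divergence for scalars $x, y > 0$ (the paper's own normalization may differ); the symbols $\mathbf{X}$ (data), $\mathbf{D}$ (dictionary) and $\mathbf{A}$ (sparse coefficients) are notation chosen here for illustration.

```latex
d_\beta(x \mid y) =
  \frac{x^{\beta} + (\beta - 1)\, y^{\beta} - \beta\, x\, y^{\beta - 1}}{\beta(\beta - 1)},
  \qquad \beta \notin \{0, 1\}

d_2(x \mid y) = \frac{x^2 + y^2 - 2xy}{2} = \frac{1}{2}(x - y)^2
  \;\;\Longrightarrow\;\;
  \sum_{i,j} d_2\!\left(X_{ij} \mid (\mathbf{D}\mathbf{A})_{ij}\right)
  = \frac{1}{2}\,\lVert \mathbf{X} - \mathbf{D}\mathbf{A} \rVert_F^2
```

Under this parameterization, setting β = 2 turns the entrywise sum of divergences into half the squared Frobenius norm of the residual, which is the quantity the original K-SVD algorithm minimizes.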

    Soft MIMO detection through sphere decoding and box optimization

    Achieving optimal detection performance with low complexity is one of the major challenges in multiple-input multiple-output (MIMO) detection. This paper presents three low-complexity Soft-Output MIMO detection algorithms that are based mainly on Box Optimization (BO) techniques. The proposed methods provide good performance with low computational cost using continuous constrained optimization techniques. The first proposed algorithm is a non-optimal Soft-Output detector of reduced complexity. This algorithm has been compared with the Soft-Output Fixed Complexity (SFSD) algorithm, obtaining lower complexity and similar performance. The two remaining algorithms are employed in a turbo receiver, achieving the max-log Maximum a Posteriori (MAP) performance. These two Soft-Input Soft-Output (SISO) algorithms were proposed in a previous work for soft-output MIMO detection; this work presents their extension to iterative decoding. The SISO algorithms presented are developed and compared with the SISO Single Tree Search (STS) algorithm in terms of efficiency and computational cost. The results show that the proposed algorithms are more efficient than the STS algorithm for high-order constellations.
    Simarro, MA.; García Mollá, VM.; Vidal Maciá, AM.; Martínez Zaldívar, FJ.; Gonzalez, A. (2018). Soft MIMO detection through sphere decoding and box optimization. Signal Processing. 145:48-58. https://doi.org/10.1016/j.sigpro.2017.11.010

    Multi-GPU adaptation of a simulator of heart electric activity

    The electrical activity of the heart is simulated by solving a large system of ordinary differential equations, which takes an enormous amount of computation time. In recent years, graphics processing units (GPUs) have been introduced in the field of high-performance computing. These powerful computing devices have attracted research groups that need to simulate the electrical activity of the heart. The research group signing this paper has developed a simulator of cardiac electrical activity that runs on a single GPU. This article describes the adaptation and modification of the simulator to run on multiple GPUs. The results confirm that the technique significantly reduces the execution time compared with that obtained on a single GPU, and allows larger problems to be solved.
    This work was funded by the Universitat Politècnica de València through its Programa de Apoyo a la Investigación y Desarrollo (PAID-06-11 and PAID-05-12), by the Generalitat Valenciana through projects PROMETEO/2009/013 and GV/2012/039 (grants for R&D projects by emerging research groups), and by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (FEDER) through project TEC2012-38142-C04.
    García Mollá, VM.; Vidal Maciá, AM.; Liberos Mascarell, A.; Climent, AM. (2013). Adaptación para multiples GPU de un simulador de actividad eléctrica en el corazón. Revista Cubana de Ciencias Informáticas. 7(4):100-111. http://hdl.handle.net/10251/39802

    Parallel signal detection for generalized spatial modulation MIMO systems

    Generalized Spatial Modulation is a recently developed technique designed to enhance the efficiency of transmissions in MIMO systems. However, correctly retrieving the sent signal at the receiving end is quite demanding. Specifically, the computation of the maximum likelihood solution is computationally very expensive. In this paper, we propose a parallel method for the computation of the maximum likelihood solution using the parallel computing library OpenMP. The proposed parallel algorithm computes the maximum likelihood solution faster than the sequential version, and substantially reduces the worst-case computing times.
    This work has been partially supported by the Spanish Ministry of Science, Innovation and Universities and by the European Union through grant RTI2018-098085-BC41 (MCUI/AEI/FEDER), by GVA through PROMETEO/2019/109, and by RED 2018-102668-T. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
    García Mollá, VM.; Simarro, MA.; Martínez Zaldívar, FJ.; Boratto, M.; Alonso-Jordá, P.; Gonzalez, A. (2022). Parallel signal detection for generalized spatial modulation MIMO systems. The Journal of Supercomputing. 78(5):7059-7077. https://doi.org/10.1007/s11227-021-04163-y
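To give a sense of why the maximum likelihood solution mentioned above is expensive, the sketch below performs exhaustive ML detection for a plain (non-GSM) real-valued MIMO model: it tries every candidate symbol vector and keeps the one closest to the received signal. It is sequential and is not the authors' algorithm; the function name `ml_detect`, the toy channel matrix and the two-symbol alphabet are assumptions made for this example.

```python
from itertools import product


def ml_detect(H, y, alphabet):
    """Exhaustive maximum likelihood detection: return the symbol vector x
    that minimises ||y - H x||^2 over all candidates (illustrative only)."""
    n_tx = len(H[0])
    best_x, best_cost = None, float("inf")
    # The search space has len(alphabet) ** n_tx candidates.
    for x in product(alphabet, repeat=n_tx):
        cost = 0.0
        for row, y_i in zip(H, y):
            residual = y_i - sum(h * s for h, s in zip(row, x))
            cost += residual * residual
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost


# Toy example: 3 receive antennas, 2 transmit antennas, no noise.
H = [[1.0, 0.5], [0.2, 1.0], [0.3, -0.4]]
sent = (1, -1)
y = [sum(h * s for h, s in zip(row, sent)) for row in H]
detected, cost = ml_detect(H, y, alphabet=(-1, 1))
print(detected, cost)
```

The number of candidates grows exponentially with the number of transmit antennas. Because each candidate's cost is computed independently, the candidate set can in principle be split among threads, which is the kind of work partitioning a shared-memory approach such as OpenMP makes possible.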

    Parallel Implementation Strategies for MIMO ID-BICM Systems

    One of the techniques currently proposed for wireless communication systems with multiple transmit and receive antennas is the use of error control coding together with iterative detection and decoding at the receiver. These sophisticated techniques significantly increase the computational cost and require large computational power. Modern computing facilities such as multicore and multi-GPU (Graphics Processing Unit) processors can decrease the computational time required, representing a promising solution for the receiver implementation in these systems. In this paper we explain how iterative receivers can improve the performance of suboptimal detectors. We also introduce a novel parallel receiver scheme based on a hybrid computing model in which CPUs and GPUs work together to accelerate the detection and decoding steps; this design exploits the features of the NVIDIA Kepler GPU architecture, relative to the previous architecture, in order to optimize the communication system performance.
    This work has been partially funded by PROMETEO/2009/013 project of Generalitat Valenciana, projects TEC2009-13741 of the Ministerio Español de Ciencia e Innovación, TEC2012-38142-C04 of the Ministerio Español de Economía y Competitividad, and PAID-05-2011 of Universitat Politècnica de València.
    Simarro Haro, MDLA.; Ramiro Sánchez, C.; Martínez Zaldívar, FJ.; Vidal Maciá, AM.; González Téllez, A.; Piñero Sipán, MG.; García Mollá, VM. (2013). Parallel Implementation Strategies for MIMO ID-BICM Systems. Waves. 5-13. http://hdl.handle.net/10251/57906

    The impact of GPU/Multicore in Signal Processing: a quantitative approach

    This paper presents a practical performance comparison between the latest generation of Graphics Processing Units (GPUs) and the latest generation of multi-core CPUs when they are used to run given Signal Processing algorithms. Two kinds of tests were considered: those in which pre-designed GPU computational libraries were available, and those in which the GPU code was developed by the authors. Results show that GPUs offer great possibilities, but programming them is still hard, and high performance can be obtained only if the algorithm can be adapted to the GPU programming model.
    This work was financially supported by the Spanish Ministerio de Ciencia e Innovación (Projects TIN2008-06570-C04-02, TEC2009-13741 and CAPAP-H3 TIN2010-12011-E), Universitat Politècnica de València through “Programa de Apoyo a la Investigación y Desarrollo (PAID-05-10)” and Generalitat Valenciana through project PROMETEO/2009/013.
    García Mollá, VM.; Gonzalez, A.; González García, CY.; Martínez Zaldívar, FJ.; Ramiro Sánchez, C.; Roger Varea, S.; Vidal Maciá, AM. (2011). The impact of GPU/Multicore in Signal Processing: a quantitative approach. Waves. (3):96-106. http://hdl.handle.net/10251/47425